Abstract

We study a new variant of the colored orthogonal range searching problem:
given a query rectangle $Q$, all colors $c$ such that at least a fraction
$\tau$ of all points in $Q$ are of color $c$ must be reported. We describe
several data structures for this problem that use pseudo-linear space and
answer queries in poly-logarithmic time.



1 Introduction 

The colored range reporting problem is a variant of the range searching
problem in which every point $p \in P$ is assigned a color $c \in C$. The set
of points $P$ is pre-processed into a data structure so that for any given
rectangle $Q$ all distinct colors of points in $Q$ can be reported efficiently.
In this paper we consider a variant of this extensively studied problem in
which only frequently occurring colors must be reported.

We say that a color $c \in C$ $\tau$-dominates rectangle $Q$ if at least a
$\tau$-fraction of points in $Q$ are of that color:
$|\{p \in P \cap Q \mid col(p) = c\}| \ge \tau |P \cap Q|$, where $col(p)$
denotes the color of point $p$. We consider several data structures that
allow us to report colors that dominate $Q$.
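
To make the definition concrete, the following brute-force sketch reports the dominating colors of a rectangle by scanning all points. It is only a reference for the definition, not one of the data structures of this paper; the function name, the `(coords, color)` point layout and the closed-rectangle convention are assumptions made for illustration, and `tau` stands for the domination threshold.

```python
from collections import Counter


def dominating_colors(points, rect, tau):
    """Brute-force reference for tau-domination (illustrative helper).

    points: iterable of (coords, color), coords being a tuple of numbers.
    rect:   list of (lo, hi) pairs, one closed interval per dimension.
    tau:    fraction in (0, 1].
    """
    # Collect the colors of the points that fall inside the rectangle.
    inside = [color for coords, color in points
              if all(lo <= x <= hi for x, (lo, hi) in zip(coords, rect))]
    if not inside:
        return set()
    counts = Counter(inside)
    # A color dominates if it accounts for at least a tau-fraction of them.
    return {c for c, k in counts.items() if k >= tau * len(inside)}
```

Such a linear scan is a convenient oracle for testing any faster structure.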

Motivation. The standard colored range reporting problem arises in many
applications. Consider a database in which every object is characterized by
several numerical values (point coordinates) and some attribute (color). For
instance, a company database may contain information about the age and
salary of each employee. The attribute associated with each employee is
her position. A query consists in reporting all different job types of all
employees with a salary between 40.000 and 60.000 who are older than 40 and
younger than 60 years old. Colored range reporting also occurs naturally
in computational biology applications: each amino acid is associated with
certain attributes (hydrophobic, charged, etc.). We may want to report the
different attributes associated with amino acids in a certain range [10].

However, in certain applications we are not interested in all attributes
that occur in the query range. Instead, we may be interested in reporting
the typical attributes. For instance, in the first example above we may wish
to know all job types such that at least a fraction $\tau$ of all employees
in a given salary and age range have a job of this type. In this paper we
describe data structures that support such and similar queries.

Related Work. Traditional colored range reporting queries can be
efficiently answered in one, two, and three dimensions. There are data
structures that use pseudo-linear space and answer one- and two-dimensional
colored range reporting queries in $O(\log n + k)$ time [7], [H] and
three-dimensional colored queries in O(log^ n + k) time [7], where $k$ is the
number of colors. A semi-dynamic data structure of Gupta et al. [7] supports
two-dimensional queries in O(log^ n + k) time and insertions in O(log^ n)
amortized time. Colored orthogonal range reporting queries in $d$ dimensions
can be answered in $O(\log n + k)$ time with a data structure that uses
O(n^+^) space [1], but no efficient pseudo-linear space data structure is
known for $d > 3$.

De Berg and Haverkort [4] consider a variant of colored range searching
in which only significant colors must be reported. A color $c$ is significant
in rectangle $Q$ if at least a fraction $\tau$ of the points of that color
belong to $Q$:
$|\{p \in Q \cap P \mid col(p) = c\}| \ge \tau |\{p \in P \mid col(p) = c\}|$.
For $d = 1$, de Berg and Haverkort [4] describe a linear space data structure
that answers queries in $O(\log n + k)$ time, where $k$ is the number of
significant colors. For $d \ge 2$, significance queries can be answered
approximately: in $O(\log n + k)$ time we can report a set of colors such
that each reported color is $(1 - \varepsilon)\tau$-significant for a fixed
constant $\varepsilon$ and all $\tau$-significant colors are reported. The
only known data structure that efficiently answers exact significance queries
uses cubic space [3].

The problem of finding the elements occurring at least $\tau n$ times in a
stream of data was studied in the context of streaming algorithms [11], [6],
[9]. It is possible to find all elements that occur at least $\tau n$ times
in a multi-set of $n$ elements with an algorithm that uses $O(1/\tau)$ space
and makes two passes through the data [11], [6], [9]. However, any algorithm
that performs only one pass through the data must use Ω(m log ^) bits of
space, where $m$ is the number of different elements (colors in our
terminology) in the multi-set [9].
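
As an illustration of the two-pass idea, here is a minimal sketch based on a classical counter-based frequent-items summary (in the style of Misra and Gries). Whether it matches the exact algorithms of the cited works is an assumption; the function name is hypothetical, `tau` is the frequency threshold, and the input is assumed to be a sequence that can be iterated twice.

```python
import math


def frequent_elements(stream, tau):
    """Return all elements occurring at least tau * n times (two passes)."""
    k = math.ceil(1 / tau)      # number of counters kept: O(1/tau) space
    counters = {}
    n = 0
    # Pass 1: every element with frequency >= tau * n survives as a candidate.
    for x in stream:
        n += 1
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # All counters are in use: decrement each, drop those at zero.
            for y in list(counters):
                counters[y] -= 1
                if counters[y] == 0:
                    del counters[y]
    # Pass 2: count the surviving candidates exactly and filter.
    exact = dict.fromkeys(counters, 0)
    for x in stream:
        if x in exact:
            exact[x] += 1
    return {x for x, c in exact.items() if c >= tau * n}
```

The first pass may keep false positives, which is why the second, exact pass is needed.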

Our Results. In this paper we show that we can find dominating colors
in an arbitrary $d$-dimensional rectangle in poly-logarithmic time using a
pseudo-linear space data structure.

• We describe a static $O(\tau^{-1} n)$ space data structure that supports
one-dimensional queries in $O(\tau^{-1} \log n \log\log n)$ time. A static
$O(\tau^{-1} n \log\log n)$ space data structure supports one-dimensional
domination queries in $O(\tau^{-1} \log n)$ time.

• In the case when all coordinates are integers bounded by $U$, there is an
$O(\tau^{-1} n)$ space static data structure that supports one-dimensional
domination queries in $O(\tau^{-1} \log\log n \log\log U)$ time.

• There is a dynamic $O(\tau^{-1} n)$ space data structure that supports
one-dimensional domination queries and insertions in $O(\tau^{-1} \log^{2} n)$
time and deletions in $O(\tau^{-1} \log^{2} n)$ amortized time. We can reduce
the update time to (amortized) $O(\log n)$ by increasing the space usage to
$O(\tau^{-1} n \log n)$.

• There is a data structure that supports domination queries in $d$
dimensions in $O(\tau^{-1} \log^{d} n)$ time and uses
$O(\tau^{-1} n \log^{d-1} n)$ space.

• There is a dynamic data structure that answers domination queries in $d$
dimensions in $O(\tau^{-1} \log^{d+1} n)$ time, uses
$O(\tau^{-1} n \log^{d-1} n)$ space, and supports insertions in
$O(\tau^{-1} \log^{d+1} n)$ time and deletions in $O(\tau^{-1} \log^{d+1} n)$
amortized time.

We describe static and dynamic data structures for one-dimensional
domination queries in Sections 2 and 3. Data structures for multi-dimensional
domination queries are described in Section 4.

2 Static Domination Queries in One Dimension 

The following simple property plays an important role in all data structures 
for domination queries. 

Observation 1 If $Q = Q_1 \cup Q_2$, $Q_1 \cap Q_2 = \emptyset$, and color $c$
is dominant in $Q$, then either $c$ is dominant in $Q_1$ or $c$ is dominant
in $Q_2$.

Due to this property, a query on a set $Q$ can be decomposed into queries on
disjoint sets $Q_1, \ldots, Q_p$ such that $\bigcup_i Q_i = Q$ and $p$ is a
constant: we find the dominating colors for each $Q_i$, and for each color $c$
that dominates some $Q_i$ we determine whether $c$ dominates $Q$ by a range
counting query.
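
The decomposition step can be sketched in one dimension as follows. The sketch assumes that the dominating colors of each part are already known and verifies each candidate with a counting query on a per-color sorted array; all function names are hypothetical, and binary search via `bisect` merely stands in for whatever range counting structure is used. Here `tau` is the domination threshold.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_color_index(points):
    """points: iterable of (x, color). Returns color -> sorted coordinates."""
    by_color = defaultdict(list)
    for x, color in points:
        by_color[color].append(x)
    for xs in by_color.values():
        xs.sort()
    return by_color


def count_color(by_color, color, a, b):
    """Range counting: number of points of `color` in [a, b]."""
    xs = by_color.get(color, [])
    return bisect_right(xs, b) - bisect_left(xs, a)


def dominating_from_parts(by_color, all_xs, candidates_per_part, a, b, tau):
    """Combine per-part dominating colors into the answer for Q = [a, b].

    all_xs: sorted coordinates of all points (gives |P ∩ Q|).
    candidates_per_part: one set of dominating colors per disjoint part of Q.
    """
    total = bisect_right(all_xs, b) - bisect_left(all_xs, a)
    if total == 0:
        return set()
    # By Observation 1 only colors dominating some part can dominate Q.
    candidates = set().union(*candidates_per_part)
    return {c for c in candidates
            if count_color(by_color, c, a, b) >= tau * total}
```

Each part contributes at most $1/\tau$ candidates, so the number of counting queries is proportional to the number of parts divided by $\tau$.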

Our data structure is based on the same approach as exponential search
trees [2]. Let $P$ be the set of all points. In the one-dimensional case we
do not distinguish between a point and its coordinate. $P$ is divided into
$\beta_n$ intervals $I_1, \ldots, I_{\beta_n}$ so that each
$P_i = P \cap I_i$ contains at most $2n^{2/3}$ points and
$\beta_n = \Theta(n^{1/3})$. Let $l_i$ and $r_i$ denote the left and right
bounds of interval $I_i$. For each $1 \le i \le j \le \beta_n$, the list
$L_{ij}$ contains the set of colors that dominate $[l_i, r_j]$. We denote by
$n_{ij}$ the total number of points in $[l_i, r_j]$.

Each interval $I_i$ is recursively subdivided in the same manner: an interval
that contains $m$ points is divided into $\beta_m$ subintervals, and each
subinterval contains at most $2m^{2/3}$ points. If some interval $I_j$ is
divided into $I_{j,1}, \ldots, I_{j,\beta}$, then we say that $I_j$ is a
parent of $I_{j,i}$ ($I_{j,i}$ is a child of $I_j$). The tree $T$ reflects the
division of intervals into sub-intervals: each tree node $u$ corresponds to an
interval $I_u$, and a node $u$ is a child of $v$ if and only if $I_u$ is a
child of $I_v$. The root of $T$ corresponds to $P$ and the leaves of $T$
correspond to points of $P$. A node of depth $i$ contains $n^{(2/3)^i}$
points. Hence, a node of depth $\log_{3/2} \log n$ contains $O(1)$ points and
the height of $T$ is $O(\log\log n)$.
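
The doubly logarithmic height can be checked numerically. The sketch below repeatedly replaces a node size $m$ by $m^{2/3}$ and counts the levels until the size drops to a constant; the function name and the stopping threshold of 2 are assumptions made for illustration, and constant factors in the interval sizes are ignored.

```python
def levels_until_constant(n, threshold=2.0):
    """Number of times m -> m**(2/3) must be applied until m <= threshold."""
    size = float(n)
    depth = 0
    while size > threshold:
        size = size ** (2.0 / 3.0)   # size of a child interval, up to constants
        depth += 1
    return depth
```

For $n = 2^{16}$ this gives 7 levels and for $n = 2^{64}$ it gives 11, in line with $\lceil \log_{3/2} \log_2 n \rceil$.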

For every color c, we also store all points of color c in a data structure that 
supports range counting queries. 

Consider a query $Q = [a, b]$. Let $l_a$ and $l_b$ be the leaves of $T$ in
which $a$ and $b$ are stored, and let $q$ be the lowest common ancestor of
$l_a$ and $l_b$. The search procedure visits all nodes on the path from $l_a$
to $q$ (and from $l_b$ to $q$); for each visited node $u$ we construct the set
of colors $S_u$ such that every $c \in S_u$ dominates $I_u \cap [a, b]$. We
also compute the total number of points in $I_u \cap [a, b]$. Let $u$ be the
currently visited node of $T$ situated between $l_b$ and $q$, and suppose that
the node $v$ visited immediately before $u$ is the $(i+1)$-st child of $u$.
Due to Observation 1, only colors stored in the list $L_{1i}$ of $u$ and in
$S_v$ may dominate $I_u \cap Q$. For each color $c$ in $L_{1i} \cup S_v$ we
count how many times it occurs in $I_u \cap Q$ using the range counting data
structure for that color. Thus we can construct $S_u$ by answering at most
$2\tau^{-1}$ counting queries. Nodes between $l_a$ and $q$ are processed in
the same way. Finally, we examine all colors in the sets $S_p$ and $S_r$ and
the list $L_{ij}$ of the node $q$, where $p$ and $r$ are nodes on the paths
from $q$ to $l_a$ and $l_b$ respectively, $p$ is the $i$-th child of $q$, and
$r$ is the $j$-th child of $q$. The search procedure visits $O(\log\log n)$
nodes and answers $O(\tau^{-1} \log\log n)$ counting queries. Since each
counting query takes $O(\log n)$ time, queries can be answered in
$O(\tau^{-1} \log n \log\log n)$ time.

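If an interval $I$ contains $m$ points, then all lists $L_{ij}$ of $I$ contain
$O(\tau^{-1} m^{2/3})$ elements in total, because there are $O(m^{2/3})$ pairs
$i \le j$ and at most $\tau^{-1}$ colors dominate each range. Data structures
for range counting queries use $O(n)$ space. Therefore the space usage of our
data structure is $O(\tau^{-1} n)$.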

We can reduce the query time to $O(\tau^{-1} \log n)$ by storing range
counting data structures for each interval: for every interval $I_u$ and every
color $c$ such that $\{p \in P \cap I_u \mid col(p) = c\} \ne \emptyset$, we
store a data structure that supports range counting queries in time
$O(\log |I_u|)$. The total number of colors in all intervals $I_u$ for all
nodes $u$ situated on the same level of tree $T$ does not exceed the number of
points in $P$. Therefore the total number of elements in all range counting
data structures is $O(n \log\log n)$. The query is processed in the same way
as described above. We must answer $O(\tau^{-1})$ counting queries on $I_q$,
$O(\tau^{-1})$ range counting queries on children of $I_q$, $O(\tau^{-1})$
range counting queries on children of children of $I_q$, etc. Therefore the
query time is
$O(\tau^{-1}(\log |I_q| + \log |I_q|^{2/3} + \log |I_q|^{4/9} + \ldots)) = O(\tau^{-1} \sum_i (2/3)^i \log n) = O(\tau^{-1} \log n)$.

We obtain the following result.

Theorem 1 There exists an $O(\tau^{-1} n \log\log n)$ space data structure
that supports one-dimensional domination queries in $O(\tau^{-1} \log n)$
time. There exists an $O(\tau^{-1} n)$ space data structure that supports
one-dimensional domination queries in $O(\tau^{-1} \log n \log\log n)$ time.

In the case when all point coordinates are integers bounded by a parameter
$U$, we can easily answer one-dimensional counting queries in
$O(\log\log U)$ time. As shown above, a domination query can be answered by
answering $O(\tau^{-1} \log\log n)$ counting queries; hence, the query time is
$O(\tau^{-1} \log\log n \log\log U)$. Since it is not necessary to store range
counting data structures for each interval, all range counting data
structures use $O(n)$ space.

Theorem 2 There exists an $O(\tau^{-1} n)$ space data structure that supports
one-dimensional domination queries in $O(\tau^{-1} \log\log U \log\log n)$
time.

3 Dynamic Domination Queries in One Dimension 

Let $T$ be a binary tree on the set of all $p \in P$. With every internal node
$v$ we associate a range $rng(v) = [l_v, r_v)$, where $l_v$ is the leftmost
leaf descendant of $v$ and $r_v$ is the leaf that follows the rightmost leaf
descendant of $v$. $T$ is implemented as a balanced binary tree, so that
insertions and deletions are supported in $O(\log n)$ time and the tree height
is $O(\log n)$. In each node $v$ we store the number of its leaf descendants
and the list $L_v$; $L_v$ contains all colors that dominate $rng(v)$. For
every color $c$ in $L_v$ we also maintain the number of points of color $c$
that belong to $rng(v)$. For each color $c$ there is also a data structure
that stores all points of color $c$ and supports one-dimensional range
counting queries.

A query $Q = [a, b]$ is answered by traversing the paths from $l_a$ to $q$ and
from $l_b$ to $q$, where $l_a$ and $l_b$ are the leaves that contain $a$ and
$b$ respectively, and $q$ is the lowest common ancestor of $l_a$ and $l_b$. As
in the previous section, in every visited node $u$ the search procedure
constructs the set of colors $S_u$ such that every $c \in S_u$ dominates
$rng(u) \cap [a, b]$. Suppose that a node $v$ on the path from $l_b$ to $q$ is
visited and let $u$ be the child of $v$ that is also on the path from $l_b$ to
$q$. If $u$ is the left child of $v$, then
$rng(v) \cap [a, b] = rng(u) \cap [a, b]$ and $S_v = S_u$. If $u$ is the right
child of $v$, then $rng(v) \cap [a, b] = rng(w) \cup (rng(u) \cap [a, b])$,
where $w$ is the sibling of $u$. Colors that dominate $rng(w)$ are stored in
$L_w$; we know the colors that dominate $rng(u) \cap [a, b]$ because $u$ was
visited before $v$ and $S_u$ is already constructed. Hence, we can construct
$S_v$ by examining each color $c \in L_w \cup S_u$ and answering a counting
query for each color. Since one-dimensional dynamic range counting queries can
be answered in $O(\log n)$ time, we spend $O(\tau^{-1} \log n)$ time in each
tree node. Nodes on the path from $l_a$ to $q$ are processed in a symmetric
way. Finally we examine the colors stored in $S_{q_1}$ and $S_{q_2}$, where
$q_1$ and $q_2$ are the children of $q$, and find the colors that dominate
$rng(q) \cap [a, b] = [a, b]$.

When a new element is inserted (deleted), we insert a new leaf $l$ into $T$
(remove $l$ from $T$). For every ancestor $v$ of $l$, the list $L_v$ is
updated.

After a new point of the color $c_p$ is inserted, the color $c_p$ may dominate
$rng(v)$ and colors in $L_v$ may cease to dominate $rng(v)$. We may check
whether $c_p$ must be inserted into $L_v$ and whether some colors $c \in L_v$
must be removed from $L_v$ by performing at most $\tau^{-1} + 1$ range
counting queries. Since a new point has $O(\log n)$ ancestors, insertions are
supported in $O(\tau^{-1} \log^{2} n)$ time.

When a point of color $c_p$ is deleted, we may have to delete the color $c_p$
from $L_v$. We can test this by performing one counting query. However, we
may also have to insert some new color $c$ into $L_v$ because the number of
points stored in descendants of the node $v$ decreased by one. To implement
this, we store the set of candidate colors $L'_v$; $L'_v$ contains all colors
that $(\tau/2)$-dominate $rng(v)$. For each color $c \in L'_v$ we test whether
$c$ became a $\tau$-dominating color after the deletion. When the number of
leaf descendants of the node $v$ has decreased by a factor of 2, we re-build
the list $L'_v$. If $P_v$ is the set of leaf descendants of $v$ (that is,
points that belong to $rng(v)$), then we can construct the set of distinct
colors that occur in $P_v$ in $O(|P_v| \log |P_v|)$ time. We can also find the
sets of colors that $\tau$-dominate and $(\tau/2)$-dominate $rng(v)$ in
$O(|P_v| \log |P_v|)$ time. Since we re-build $L'_v$ after a sequence of at
least $|P_v|/2$ deletions, a re-build of some $L'_v$ incurs an amortized cost
of $O(\log n)$. Every deletion may affect $O(\log n)$ ancestors; hence,
deletions are supported in $O(\log^{2} n)$ amortized time.

We can speed up the update operations by storing in each tree node $u$ the
set of distinct colors in $P_u$, denoted by $C_u$. For each color $c \in C_u$,
we store how many times points with color $c$ occur in $P_u$. When a new point
$p$ is inserted/deleted, we can update $C_v$ for each ancestor $v$ of $p$ in
$O(1)$ time. Using $C_v$, we can decide whether a given new color must be
inserted into $L_v$ in $O(1)$ time. Using $C_v$ we can also re-build $L'_v$ in
$O(|C_v|) = O(|P_v|)$ time. Hence, we can support insertions in
$O(\tau^{-1} \log n)$ time and deletions in $O(\log n)$ time with the help of
the lists $C_v$. The total number of elements in all $C_v$ is
$O(\tau^{-1} n \log n)$.
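
The per-node bookkeeping with color counters can be sketched as follows. The class and function names are hypothetical and `tau` is the domination threshold. The insertion path follows the reasoning above: only the inserted color can start to dominate, so it suffices to re-check that color and the colors already in the list. For deletions the sketch simply rescans all counters of the node, which is a deliberate simplification and not the amortized candidate-list scheme described in the text.

```python
from collections import Counter


class NodeColors:
    """One tree node: color counters, number of points, dominating colors."""

    def __init__(self, tau):
        self.tau = tau
        self.counts = Counter()     # color -> multiplicity in the node's range
        self.size = 0               # number of points in the node's range
        self.dominating = set()     # colors that tau-dominate the range

    def insert(self, color):
        self.counts[color] += 1
        self.size += 1
        # Only `color` can newly dominate; previous members may drop out.
        pool = self.dominating | {color}
        self.dominating = {c for c in pool
                           if self.counts[c] >= self.tau * self.size}

    def delete(self, color):
        # Precondition: a point of this color is present in the node.
        self.counts[color] -= 1
        if self.counts[color] == 0:
            del self.counts[color]
        self.size -= 1
        # Simplification: full rescan instead of a candidate list.
        self.dominating = {c for c, k in self.counts.items()
                           if k >= self.tau * self.size}


def update_ancestors(path, color, inserted=True):
    """Apply one insertion or deletion to every node on a leaf-to-root path."""
    for node in path:
        if inserted:
            node.insert(color)
        else:
            node.delete(color)
```

With `tau = 0.5`, inserting the colors r, r, g, b, b into one node leaves no dominating color (each has at most 2 of 5 points); deleting g afterwards makes both r and b dominate.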

Thus we obtain the following result.

Theorem 3 There exists an $O(\tau^{-1} n)$ space data structure that supports
one-dimensional domination queries and insertions in
$O(\tau^{-1} \log^{2} n)$ time and deletions in $O(\tau^{-1} \log^{2} n)$
amortized time. There exists an $O(\tau^{-1} n \log n)$ space data structure
that supports one-dimensional domination queries and insertions in
$O(\tau^{-1} \log n)$ time and deletions in $O(\tau^{-1} \log n)$ amortized
time.

4 Multi-Dimensional Domination Queries 

We can extend our data structures to support d-dimensional queries for an 
arbitrary constant d using the standard range trees pj approach. 

Theorem 4 There exists an $O(n \log^{d} n)$ space data structure that supports
$d$-dimensional orthogonal range domination queries in
$O(\log^{d-1} n (\log\log n)^{2})$ time.

We describe how we can construct a $d$-dimensional data structure if we know
how to construct a $(d-1)$-dimensional data structure. A range tree $T$ is
constructed on the set of $d$-th coordinates of all points. An arbitrary
interval $[a_d, b_d]$ can be represented as a union of $O(\log n)$ node
ranges. Hence, an arbitrary $d$-dimensional query
$Q = Q^{d-1} \times [a_d, b_d]$ can be represented as a union of queries
$Q_1, \ldots, Q_t$, where $t = O(\log n)$ and $Q_i = Q^{d-1} \times rng(v_i)$
for some node $v_i$ of $T$. In each node $v$ of $T$ we store a
$(d-1)$-dimensional data structure $D_v$ that contains the first $d-1$
coordinates of all points whose $d$-th coordinates belong to $rng(v)$. $D_v$
supports modified domination queries in $d-1$ dimensions: for a
$(d-1)$-dimensional query rectangle $Q'$, $D_v$ outputs all colors that
dominate $Q' \times rng(v)$. Using $D_{v_i}$ we can find the (at most
$\tau^{-1}$) colors that dominate $Q_i = Q^{d-1} \times rng(v_i)$. Since $Q$
is a union of $O(\log n)$ ranges $Q_i$, we can identify a set $C$ that
contains $O(\tau^{-1} \log n)$ candidate colors by answering $O(\log n)$
modified $(d-1)$-dimensional domination queries. As follows from
Observation 1, only a color from $C$ can dominate $Q$. Hence, we can identify
all colors that $\tau$-dominate $Q$ by answering $O(\tau^{-1} \log n)$
$d$-dimensional range counting queries. Thus the query time for
$d$-dimensional queries can be computed with the formula
$q(n, d) = O(\log n)\, q(n, d-1) + O(\tau^{-1} \log n)\, c(n, d)$,
where $q(n, d)$ is the query time for $d$-dimensional domination queries and
$c(n, d)$ is the query time for $d$-dimensional counting queries. We can
answer $d$-dimensional range counting queries in $O(\log^{d-1} n)$ time and
$O(n \log^{d-1} n)$ space [5]. We can answer one-dimensional domination
queries in $O(\tau^{-1} \log n)$ time by Theorem 1. Therefore $d$-dimensional
domination queries can be answered in $O(\tau^{-1} \log^{d} n)$ time.
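
To see how the recurrence resolves, it can be unrolled explicitly. In the sketch below each counting query is bounded by $O(\log^{d-1} n)$, $\tau^{-1}$ is the reciprocal of the domination threshold, and $a$, $b$ are names introduced here for the hidden constants (they do not appear in the text).

```latex
\begin{align*}
q(n,d) &\le a \log n \cdot q(n,d-1) + b\,\tau^{-1} \log n \cdot \log^{d-1} n \\
       &=   a \log n \cdot q(n,d-1) + b\,\tau^{-1} \log^{d} n \\
       &\le a^{d-1} \log^{d-1} n \cdot q(n,1)
            + b\,\tau^{-1} \log^{d} n \sum_{j=0}^{d-2} a^{j} .
\end{align*}
% With q(n,1) = O(\tau^{-1} \log n) and d constant, both terms are
% O(\tau^{-1} \log^{d} n).
```

Each level of the unrolling multiplies the previous additive term by $a \log n$ while its own counting cost has one logarithmic factor fewer, so every level contributes the same order $\tau^{-1} \log^{d} n$, and there are only $d - 1$ levels.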

We can apply the reduction to rank space technique [12], [S] and replace
all point coordinates with labels from $[1, n]$. This will increase the query
time by an additive term $O(\log n)$. Since point coordinates are then bounded
by $n$, we can apply Theorem 2 and answer one-dimensional domination queries
in $O((\log\log n)^{2})$ time using an $O(n)$ space data structure. Since the
space usage grows by an $O(\log n)$ factor with each dimension, our data
structure uses $O(n \log^{d-1} n)$ space.
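
A minimal one-dimensional sketch of the reduction to rank space is given below. It assumes distinct coordinates; the class and method names are hypothetical, and a sorted array with binary search stands in for the predecessor structure that accounts for the additive $O(\log n)$ term.

```python
from bisect import bisect_left, bisect_right


class RankSpace:
    """Maps coordinates to labels 1..n and query intervals to label ranges."""

    def __init__(self, coords):
        self.sorted = sorted(coords)

    def rank_of_point(self, x):
        """1-based label of a stored coordinate x."""
        return bisect_left(self.sorted, x) + 1

    def query_to_ranks(self, a, b):
        """Label range (lo, hi) of stored points in [a, b]; empty if lo > hi."""
        lo = bisect_left(self.sorted, a) + 1
        hi = bisect_right(self.sorted, b)
        return lo, hi
```

For the coordinates `[10.5, 3.2, 7.0, 42.0]`, the point 7.0 receives label 2 and the query `[5, 11]` becomes the label range `(2, 3)`.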

Theorem 5 There exists a data structure that supports domination queries
in $d$ dimensions in $O(\tau^{-1} \log^{d} n)$ time and uses
$O(n \log^{d-1} n)$ space.

The same range trees approach can also be applied to the dynamic
one-dimensional data structure for domination queries. Since one-dimensional
dynamic domination queries can be answered in $O(\tau^{-1} \log^{2} n)$ time
and dynamic range counting queries can be answered in $O(\log^{d} n)$ time and
$O(n \log^{d-1} n)$ space, $d$-dimensional domination queries can be answered
in $O(\tau^{-1} \log^{d+1} n)$ time, and the space usage is
$O(\tau^{-1} n \log^{d-1} n)$. Since updates are supported in
$O(\log^{2} n)$ (amortized) time in the one-dimensional case and update times
grow by an $O(\log n)$ factor with each dimension, the $d$-dimensional data
structure supports updates in $O(\log^{d+1} n)$ (amortized) time.

Theorem 6 There is a dynamic data structure that answers domination
queries in $d$ dimensions in $O(\tau^{-1} \log^{d+1} n)$ time, uses
$O(\tau^{-1} n \log^{d-1} n)$ space, and supports insertions in
$O(\tau^{-1} \log^{d+1} n)$ time and deletions in
$O(\tau^{-1} \log^{d+1} n)$ amortized time.

Conclusion 

We presented data structures for a new variant of the colored range reporting
problem. Our data structures use pseudo-linear space and report all
$\tau$-dominating colors in poly-logarithmic time in the case when the
parameter $\tau^{-1}$ is small, i.e. constant or poly-logarithmic in $n$. It
would be interesting to construct efficient data structures for larger values
of $\tau^{-1}$.

Another interesting problem is the construction of an efficient data
structure that finds, for an arbitrary given rectangle $Q$ and a (fixed)
parameter $p$, the $p$ most frequently occurring colors in the rectangle $Q$.
That is, the data structure must find the set of colors
$C_p = \{c_1, \ldots, c_p\}$ such that

$|\{p \in P \cap Q \mid col(p) = c_i\}| \ge |\{p \in P \cap Q \mid col(p) = c\}|$
for every $c_i \in C_p$ and every $c \notin C_p$.